Stemming Strategies for European Languages
Abstract
In this paper, we describe and evaluate different general stemming approaches for the French, Portuguese (Brazilian), German and Hungarian languages. Based on the CLEF test-collections, we demonstrate that light stemming approaches are quite effective for the French, Portuguese and Hungarian languages, and perform reasonably well for the German language. Variations in mean average precision among the different stemmers are also evaluated and are sometimes found to be statistically significant.
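The abstract compares stemmers by mean average precision (MAP). As a reminder of how that metric is usually computed, here is a minimal Python sketch. The function names and the toy rankings are hypothetical and purely illustrative; they are not taken from the paper or from the CLEF evaluation tooling.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    Sums precision@k at every rank k where a relevant document appears,
    then divides by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Arithmetic mean of per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    )
    return total / len(qrels)


# Toy data (hypothetical): two queries, ranked result lists, relevance judgments.
toy_runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
toy_qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
toy_map = mean_average_precision(toy_runs, toy_qrels)
```

Comparing two stemmers under this setup would mean producing one ranked run per stemmer over the same queries and judgments, then comparing the resulting MAP values, ideally with a significance test as the abstract mentions.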
Similar resources
Monolingual Document Retrieval: English versus other European Languages
The vast majority of research in information retrieval is done using English collections and topics. This raises questions about the effectiveness of retrieval strategies for other languages. To examine this issue, we focus on document retrieval in nine European languages. In particular, we investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming an...
Lexical and Algorithmic Stemming Compared for 9 European Languages with Hummingbird SearchServer™ at CLEF 2003
Hummingbird participated in the monolingual information retrieval tasks of the Cross-Language Evaluation Forum (CLEF) 2003: for natural language queries in 9 European languages (German, French, Italian, Spanish, Dutch, Finnish, Swedish, Russian and English) find all the relevant documents (with high precision) in the CLEF 2003 document sets. For each language, SearchServer scored higher than th...
Data Fusion for Effective European Monolingual Information Retrieval
For our fourth participation in the CLEF evaluation campaigns, our first objective was to propose an effective and general stopword list and a light stemming procedure for the Portuguese language. Our second objective was to obtain a better picture of the relative merit of various search engines when processing documents in the Finnish and Russian languages. Finally, based on the Z-score method...
Terrier takes on the non-English Web
The aim of this work is to identify how standard Information Retrieval (IR) techniques can be adapted in Web retrieval for non-English queries. In particular, we address the challenge of stemming queries and documents in a multilingual setting. Experiments with a multilingual collection of over 20 languages, more than 800 queries, and various stemming strategies in these languages reveal that u...
Stemming Approaches for East European Languages
During this CLEF evaluation campaign, the first objective is to propose and evaluate various indexing and search strategies for the Czech language that will hopefully result in more effective retrieval than language-independent approaches (n-gram). Based on the stemming strategy we developed for other languages, we propose that for the Slavic language a light stemmer (inflectional only) and als...
Publication date: 2010